A comparison of traditional and rough set approaches to missing attribute values in data mining
Abstract
Real-life data sets are often incomplete, i.e., some attribute values are missing. In this paper we compare traditional, frequently used methods of handling missing attribute values, which are based on preprocessing, with another class of methods in which rule induction is performed directly on incomplete data sets, i.e., handling missing attribute values and rule induction are conducted concurrently. In our experiments, four traditional methods of handling missing attribute values were applied: Most Common Value, Concept Most Common Value, Closest Fit, and Concept Closest Fit. Both Closest Fit methods were enhanced by a rough set approach to missing attribute values. On the same typical data sets, experiments were conducted with the MLEM2 rule induction algorithm, which is based on rough set theory, using three different rough set interpretations of missing attribute values: lost values, “do not care” conditions, and attribute-concept values. The best method is Concept Closest Fit enhanced by interpreting the remaining missing attribute values as lost values.
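The abstract names the preprocessing methods without spelling out their procedures. The following is a minimal Python sketch of how Most Common Value and Concept Most Common Value are commonly understood for symbolic attributes, not the authors' implementation. The `"?"` marker for a missing value, the function names, and the toy data set are all assumptions made for illustration.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def most_common(values):
    """Return the most frequent known value, or None if every value is missing."""
    known = [v for v in values if v != MISSING]
    return Counter(known).most_common(1)[0][0] if known else None


def impute_most_common(rows, attributes):
    """Most Common Value: fill each gap with the attribute's overall mode."""
    modes = {a: most_common(r[a] for r in rows) for a in attributes}
    result = []
    for r in rows:
        filled = dict(r)  # copy so the input rows are left untouched
        for a in attributes:
            if r[a] == MISSING and modes[a] is not None:
                filled[a] = modes[a]
        result.append(filled)
    return result


def impute_concept_most_common(rows, attributes, decision):
    """Concept Most Common Value: use the mode among cases with the same decision value."""
    result = []
    for r in rows:
        concept = [c for c in rows if c[decision] == r[decision]]
        filled = dict(r)
        for a in attributes:
            if r[a] == MISSING:
                mode = most_common(c[a] for c in concept)
                if mode is not None:
                    filled[a] = mode
        result.append(filled)
    return result


# Hypothetical toy data set: two attributes and a decision ("flu").
rows = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "?", "flu": "yes"},
    {"temperature": "?", "headache": "no", "flu": "no"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
    {"temperature": "high", "headache": "yes", "flu": "yes"},
]
attributes = ["temperature", "headache"]

mcv = impute_most_common(rows, attributes)
cmcv = impute_concept_most_common(rows, attributes, "flu")
print(mcv[1]["headache"], mcv[2]["temperature"])
print(cmcv[1]["headache"], cmcv[2]["temperature"])
```

On this toy data the two methods disagree: the overall mode ignores the decision value, while the concept-restricted mode only looks at cases from the same concept. The Closest Fit variants and the rough set interpretations handled by MLEM2 are not covered by this sketch.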
Similar resources
Mining Incomplete Data with Many Missing Attribute Values A Comparison of Probabilistic and Rough Set Approaches
In this paper, we study probabilistic and rough set approaches to missing attribute values. Probabilistic approaches are based on imputation: a missing attribute value is replaced either by the most probable known attribute value or by the most probable attribute value restricted to a concept. In a rough set approach to missing attribute values, we consider two interpretations of ...
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to factors that determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...
A Comparison of Several Approaches to Missing Attribute Values in Data Mining
In the paper, nine different approaches to missing attribute values are presented and compared. Ten input data files were used to investigate the performance of the nine methods of dealing with missing attribute values. For testing, both naive classification and new classification techniques of LERS (Learning from Examples based on Rough Sets) were used. The quality criterion was the average error r...
A Comparative Study on Decision Rule Induction for incomplete data using Rough Set and Random Tree Approaches
Handling missing attribute values is the most challenging process in data analysis. Many approaches can be adopted to handle missing attributes. In this paper, a comparative analysis is made of an incomplete dataset for future prediction using a rough set approach and random tree generation in data mining. The result of a simple classification technique (using random tree ...
Three Approaches to Missing Attribute Values: A Rough Set Perspective
A new approach to missing attribute values, based on the idea of an attribute-concept value, is studied in the paper. This approach, together with two other approaches to missing attribute values, based on "do not care" conditions and lost values, is discussed using rough set methodology, including attribute-value pair blocks, characteristic sets, and characteristic relations. Characteristic se...